We consider feed-forward neural networks with one hidden layer, tree architecture and a fixed
hidden-to-output Boolean function. Focusing on the saturation limit of the storage problem, we
investigate the influence of replica symmetry breaking on the distribution of local fields at the
hidden units. These field distributions determine the probability of finding a specific activation
pattern of the hidden units as well as the corresponding correlation coefficients and therefore quantify
the division of labor among the hidden units. We find that, although it markedly modifies the storage
capacity and the distribution of local fields, replica symmetry breaking has only a minor effect on
the correlation coefficients. Detailed numerical results are provided for the PARITY, COMMITTEE
and AND machines with K = 3 hidden units and nonoverlapping receptive fields.

I. INTRODUCTION

Multilayer neural networks (MLN) are more powerful devices for information processing than the single-layer
perceptron because of the possibility of different activation patterns, so-called internal representations (IR), at the
hidden units for the same input-output mapping. It is well known that the correlations between the activities at
the hidden units are crucial for the understanding of the storage and generalization properties of a MLN [?]. A
particularly simple situation in which to study these correlations is the implementation of random input-output
mappings by the network, the so-called storage problem, near the storage capacity. Using the replica trick and
assuming replica symmetry, the correlation coefficients building up in this case were calculated in [?] and shown
to be characteristic of the prewired Boolean function between hidden layer and output. Conversely, prescribing
these correlations changes the storage properties of the network [?].

The assumption of replica symmetry (RS) in this calculation is somewhat doubtful. In fact, it is well known that
the storage capacity of MLN is strongly modified by replica symmetry breaking (RSB) [?], which is due to the
very possibility of different internal representations. Moreover, even the distribution of the output field of a simple
perceptron is influenced by RSB effects [?].

In the present paper we elucidate the impact of RSB on the correlation coefficients between the activities of different
hidden units in MLN with one hidden layer and nonoverlapping receptive fields. The central quantity of interest is
the joint probability distribution for the local fields at the hidden units. In the general part of this paper we show how
this distribution can be calculated both in RS and in one-step RSB. For a detailed analysis we then specialize to MLN
with K = 3 hidden units and discuss, in particular, the PARITY, COMMITTEE and AND machines. Together with
the corrections from one-step RSB, the RS results give insight into the division of labor between different subperceptrons
in MLN and the role of RSB. Finally, calculating the correlation coefficients, we find that, although it markedly modifies
the local field distribution, RSB gives rise to only minor corrections to the correlation coefficients.



II. GENERAL RESULTS 

We consider feed-forward neural networks with $N$ inputs $\xi_i$, one hidden layer of $K$ units $\tau_1, \tau_2, \ldots, \tau_K$ and a
single output $\sigma$. The hidden units have nonoverlapping receptive fields of dimension $N/K$ (tree structure). They
are determined by the inputs via spherical coupling vectors $\mathbf{J}_k \in \mathbb{R}^{N/K}$, $\mathbf{J}_k^2 = N/K$, according to $\tau_k = \mathrm{sgn}(h_k)$
with $h_k = \mathbf{J}_k \cdot \boldsymbol{\xi}_k \sqrt{K/N}$ denoting the local fields. We call an activation pattern $(\tau_1^\nu, \tau_2^\nu, \ldots, \tau_K^\nu)$ of the hidden units
an internal representation (IR). The output $\sigma$ of the MLN is a fixed Boolean function $\sigma = F(\tau_1, \ldots, \tau_K)$ of the
IR. Examples of special interest include the PARITY machine, $F(\{\tau_k\}) = \prod_{k=1}^{K} \tau_k$, the COMMITTEE machine,
$F(\{\tau_k\}) = \mathrm{sgn}(\sum_{k=1}^{K} \tau_k)$, and the AND machine, $F = +1$ if all $\tau_k = +1$; else $F = -1$.



All IR consistent with a desired output are called legal internal representations (LIR). The number of LIR for a
given output and the similarity between them specify the division of labor taking place between the different
perceptrons forming the MLN. It is quantitatively characterized by the correlation coefficients

$$ c_n = \langle\langle \sigma\, \tau_{i_1} \tau_{i_2} \cdots \tau_{i_n} \rangle\rangle, \qquad (1) $$

$n = 1, \ldots, K$, where $\langle\langle \cdots \rangle\rangle$ denotes the average over the inputs and the output and $i_1, \ldots, i_n$ is a subset of $n$ natural
numbers between 1 and $K$. For permutation symmetric Boolean functions, the $c_n$ depend only on $n$ and not on the
particular choice of this subset.
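
The three Boolean functions and their legal internal representations can be enumerated directly for small K. The following Python sketch is only an illustration of the definitions above; the function names (`parity`, `committee`, `and_machine`, `legal_internal_representations`) are our own and do not appear in the original work.

```python
from itertools import product
from math import prod


def parity(taus):
    """PARITY machine: product of the hidden-unit activities."""
    return prod(taus)


def committee(taus):
    """COMMITTEE machine: majority vote (K is assumed odd, so no ties occur)."""
    return 1 if sum(taus) > 0 else -1


def and_machine(taus):
    """AND machine: +1 only if every hidden unit is +1."""
    return 1 if all(t == 1 for t in taus) else -1


def legal_internal_representations(F, sigma, K=3):
    """All activation patterns (tau_1, ..., tau_K) with F(taus) == sigma."""
    return [taus for taus in product((1, -1), repeat=K) if F(taus) == sigma]


if __name__ == "__main__":
    for name, F in (("PARITY", parity), ("COMMITTEE", committee), ("AND", and_machine)):
        for sigma in (1, -1):
            lir = legal_internal_representations(F, sigma)
            print(name, sigma, len(lir), lir)
```

For K = 3 the PARITY and COMMITTEE machines have four LIR for each output, whereas the AND machine has one LIR for output +1 and seven for output -1.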

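We focus on the so-called storage problem in which the inputs $\xi_{ki}^\nu$ and the outputs $\sigma^\nu$ are generated independently
at random according to the probability distributions

$$ P(\sigma^\nu) = \frac{\delta(\sigma^\nu - 1) + \delta(\sigma^\nu + 1)}{2} \qquad (2) $$

and

$$ P(\xi_{ki}^\nu) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2} (\xi_{ki}^\nu)^2\right), \qquad (3) $$

where $k = 1, \ldots, K$, $i = 1, \ldots, N/K$ and $\nu = 1, \ldots, \alpha N$.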

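The basic quantity which gives us access to the probability of the LIR and to the correlation coefficients is the
distribution $p(h_j)$ of the local fields $h_j$ at the $j$th hidden unit. It is given by

$$ p(h_j) = \left\langle\!\!\left\langle \frac{1}{Z} \int \prod_{k=1}^{K} d\mu(\mathbf{J}_k) \prod_{k,\nu} d\mu(\lambda_k^\nu) \prod_{\nu=1}^{\alpha N} \theta\big(\sigma^\nu F(\mathrm{sgn}(\lambda_1^\nu), \ldots, \mathrm{sgn}(\lambda_K^\nu))\big)\, \delta(h_j - \lambda_j^1) \right\rangle\!\!\right\rangle. \qquad (4) $$
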
$\langle\langle \cdots \rangle\rangle$ denotes the average over all stored input-output patterns. $Z$ denotes the partition function

$$ Z = \int \prod_{k=1}^{K} d\mu(\mathbf{J}_k) \prod_{k,\nu} d\mu(\lambda_k^\nu) \prod_{\nu=1}^{\alpha N} \theta\big(\sigma^\nu F(\mathrm{sgn}(\lambda_1^\nu), \ldots, \mathrm{sgn}(\lambda_K^\nu))\big), \qquad (5) $$

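$d\mu(\mathbf{J}_k)$ the measure on the Gardner sphere [?],

$$ d\mu(\mathbf{J}_k) = \prod_{i=1}^{N/K} \frac{dJ_{ki}}{\sqrt{2\pi e}}\; \delta\big(\mathbf{J}_k^2 - N/K\big), \qquad (6) $$

and $d\mu(\lambda_k^\nu)$ the integration measure

$$ d\mu(\lambda_k^\nu) = \delta\big(\lambda_k^\nu - \mathbf{J}_k \cdot \boldsymbol{\xi}_k^\nu \sqrt{K/N}\big)\, d\lambda_k^\nu. \qquad (7) $$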

We use the replica trick $1/Z = \lim_{n \to 0} Z^{n-1}$ in Eq. (4) to perform the average over the inputs $\{\xi_{ki}^\nu\}$ and introduce
the overlaps $q_k^{ab} = \mathbf{J}_k^a \cdot \mathbf{J}_k^b/(N/K)$ between different replicas $a$, $b$ of the coupling vector $\mathbf{J}_k$ of hidden unit $k$. We will
consider only permutation symmetric Boolean functions $F$. Hence all hidden units have the same statistical properties,
implying $p(h_k) = p(h)$ and $q_k^{ab} = q^{ab}$ for $k = 1, \ldots, K$. Equation (4) then takes the form

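$$ p(h_j) = \lim_{n \to 0} \int \prod_{a<b} dq^{ab}\, \langle\langle p(h_j|\sigma) \rangle\rangle_\sigma \exp\left( \frac{N}{2} \ln\det(Q) + (\alpha N - 1) \langle\langle \ln G_1(Q|\sigma) \rangle\rangle_\sigma \right) \qquad (8) $$

in terms of the $(n \times n)$-dimensional order parameter matrix $Q$ where $Q^{aa} = 1$ and $Q^{ab} = q^{ab}$. Here

$$ p(h_j|\sigma) = \int \prod_{k,a} \frac{d\lambda_k^a\, dx_k^a}{2\pi} \exp\left( i \sum_{k,a} x_k^a \lambda_k^a - \frac{1}{2} \sum_{k,a} (x_k^a)^2 - \sum_{k,a<b} q^{ab} x_k^a x_k^b \right) \prod_{a} \theta\big(\sigma F(\{\mathrm{sgn}(\lambda_k^a)\})\big)\, \delta(h_j - \lambda_j^1), \qquad (9) $$

and the expression for $G_1(Q|\sigma)$ is specified in the Appendix, Eq. (A6), together with some more details of the
calculation.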

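In the limit $N \to \infty$ the integral (8) is dominated by the saddle point values of the order parameters $q^{ab}$ which
extremize the partition function

$$ Z = \exp\left( N \mathop{\mathrm{extr}}_{q^{ab}} \lim_{n \to 0} \frac{\frac{1}{2} \ln\det(Q) + \alpha \langle\langle \ln G_1(Q|\sigma) \rangle\rangle_\sigma}{n} \right). \qquad (10) $$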

In the following, we simplify Eqs. (8) and (10) using the assumption that the order parameter matrix $Q$ is either
replica symmetric or describes one-step replica symmetry breaking. We will always consider the saturation limit
$\alpha \to \alpha_c$ since the expressions then simplify and the correlations become most characteristic in this limit. The RS case
is specified by [14]

$$ q^{ab} = \begin{cases} 1 & \text{if } a = b \\ q & \text{else.} \end{cases} \qquad (11) $$

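The saturation limit $\alpha \to \alpha_c$ is characterized by the existence of a unique solution $\mathbf{J}_k$, i.e., $q \to 1$. We then get

$$ p(h|\sigma) = \int \prod_{k=1}^{K} Dy_k \lim_{q \to 1} \frac{\exp\left( -\frac{1}{2} (h + y_1 \sqrt{q})^2/(1 - q) \right)}{\sqrt{2\pi(1 - q)}}\; \frac{\Phi_{\mathrm{LIR}}(\sigma|\delta_{\eta_1,\mathrm{sgn}(h)})}{\Phi_{\mathrm{LIR}}(\sigma)} \qquad (12) $$

for the conditional probability to find a specific value $h$ of the postsynaptic potential under the constraint of a given
output $\sigma$. The terms abbreviated by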
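
$$ \Phi_{\mathrm{LIR}}(\sigma|\delta_{\eta_1,\mathrm{sgn}(h)}) := \sum_{\text{all sets } (\eta_1, \ldots, \eta_K)} \delta_{\eta_1,\mathrm{sgn}(h)}\, \delta_{\sigma,F(\eta_1, \ldots, \eta_K)} \prod_{k=2}^{K} H\left( \eta_k y_k \sqrt{\frac{q}{1 - q}} \right), \qquad (13) $$

$$ \Phi_{\mathrm{LIR}}(\sigma) := \sum_{\text{all sets } (\eta_1, \ldots, \eta_K)} \delta_{\sigma,F(\eta_1, \ldots, \eta_K)} \prod_{k=1}^{K} H\left( \eta_k y_k \sqrt{\frac{q}{1 - q}} \right) \qquad (14) $$

ensure that only LIR for the respective value of $\sigma$ contribute to the sum in Eq. (13). As usual, we have used the error
function $H(x) = \int_x^\infty Dt$ with $Dt = \exp(-t^2/2)\, dt/\sqrt{2\pi}$.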
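
To make the structure of the restricted sum concrete, the following Python sketch evaluates $\Phi_{\mathrm{LIR}}(\sigma)$ for a K = 3 COMMITTEE machine at fixed values of the integration variables. It assumes the argument of $H$ is $\eta_k y_k \sqrt{q/(1-q)}$ as read from Eq. (14); the names `H`, `phi_lir` and `committee` are ours, and the numerical values of `y` and `q` are arbitrary illustrations.

```python
from itertools import product
from math import erfc, prod, sqrt


def H(x):
    """Gaussian tail H(x) = int_x^inf Dt, written via the complementary error function."""
    return 0.5 * erfc(x / sqrt(2.0))


def committee(taus):
    return 1 if sum(taus) > 0 else -1


def phi_lir(F, sigma, y, q):
    """Sum over all LIR of output sigma of prod_k H(eta_k * y_k * sqrt(q / (1 - q)))."""
    scale = sqrt(q / (1.0 - q))
    total = 0.0
    for eta in product((1, -1), repeat=len(y)):
        if F(eta) == sigma:  # Kronecker delta: keep legal representations only
            total += prod(H(e * yk * scale) for e, yk in zip(eta, y))
    return total


if __name__ == "__main__":
    y = (0.3, -1.2, 0.7)  # illustrative values of y_1, y_2, y_3
    q = 0.9               # illustrative overlap
    plus = phi_lir(committee, 1, y, q)
    minus = phi_lir(committee, -1, y, q)
    print(plus, minus, plus + minus)
```

Because $H(x) + H(-x) = 1$, the contributions of the two outputs add up to one for any choice of `y` and `q`, which is a convenient sanity check when implementing the sum.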

Let us now turn to the main features of the solution within the ansatz of one-step RSB. The following form of
the order parameter matrix is then assumed [14]:

$$ q^{ab} = \begin{cases} 1 & \text{if } a = b \\ q_1 & \text{if } |a - b| < m \\ q_0 & \text{else.} \end{cases} \qquad (15) $$

Accordingly, there are two overlap scales characterizing the similarity between coupling vectors belonging to the same
and to different regions of the solution space, respectively.

Using this ansatz we find after standard manipulations [14] for the probability distribution of the local field for a
specific output $\sigma$

$$ p(h|\sigma) = \int \prod_{k=1}^{K} Dy_k\; \frac{\displaystyle \int \prod_{k=1}^{K} Dz_k\, \frac{1}{\sqrt{2\pi(1 - q_1)}} \exp\left( -\frac{(h + y_1 \sqrt{q_0} + z_1 \sqrt{q_1 - q_0})^2}{2(1 - q_1)} \right) \Phi_{\mathrm{LIR}}(\sigma|\delta_{\eta_1,\mathrm{sgn}(h)}) \left( \Phi_{\mathrm{LIR}}(\sigma) \right)^{m-1}}{\displaystyle \int \prod_{k=1}^{K} Dz_k \left( \Phi_{\mathrm{LIR}}(\sigma) \right)^{m}}, \qquad (16) $$

where now

$$ \Phi_{\mathrm{LIR}}(\sigma|\delta_{\eta_1,\mathrm{sgn}(h)}) := \sum_{\text{all sets } (\eta_1, \ldots, \eta_K)} \delta_{\eta_1,\mathrm{sgn}(h)}\, \delta_{\sigma,F(\eta_1, \ldots, \eta_K)} \prod_{k=2}^{K} H\left( \eta_k \frac{y_k \sqrt{q_0} + z_k \sqrt{q_1 - q_0}}{\sqrt{1 - q_1}} \right), \qquad (17) $$

$$ \Phi_{\mathrm{LIR}}(\sigma) := \sum_{\text{all sets } (\eta_1, \ldots, \eta_K)} \delta_{\sigma,F(\eta_1, \ldots, \eta_K)} \prod_{k=1}^{K} H\left( \eta_k \frac{y_k \sqrt{q_0} + z_k \sqrt{q_1 - q_0}}{\sqrt{1 - q_1}} \right). \qquad (18) $$



These expressions simplify in the saturation limit $\alpha \to \alpha_c$, in which one finds $q_1 \to 1$ and $m = w(1 - q_1) \to 0$. The
remaining order parameters $w$, $q_0$ are given by the saddle point equations corresponding to the following expression
for the storage capacity $\alpha_c$:

$$ \alpha_c = \min_{w, q_0} \frac{\ln[1 + w(1 - q_0)] + q_0 w/[1 + w(1 - q_0)]}{\displaystyle -2 \lim_{q_1 \to 1} \left\langle\!\!\left\langle \int \prod_{k=1}^{K} Dy_k \ln\left\{ \int \prod_{k=1}^{K} Dz_k \left( \Phi_{\mathrm{LIR}}(\sigma) \right)^{m} \right\} \right\rangle\!\!\right\rangle_\sigma}. \qquad (19) $$

As in the RS case, the analytical and numerical analysis of these expressions for concrete situations needs some care
(see next section).

To finally obtain $p(h)$ we must average Eqs. (12) and (16) over the two possible outputs $\sigma = \pm 1$,

$$ p(h) = \langle\langle p(h|\sigma) \rangle\rangle_\sigma. \qquad (20) $$

From this probability distribution we find the distributions $p(\tau_1, \ldots, \tau_K)$ of the LIR according to

$$ p(\tau_1, \ldots, \tau_K) = \int_{-\infty}^{\infty} \prod_{k=1}^{K} dh_k\, \theta(\tau_k h_k)\, p(h_k). \qquad (21) $$

The correlation coefficients $c_n$, $n = 1, \ldots, K$, are then given by

$$ c_n = \sum_{\text{all sets } (\eta_1, \ldots, \eta_K)} \sigma\, \eta_1 \eta_2 \cdots \eta_n\, \delta_{\sigma,F(\eta_1, \eta_2, \ldots, \eta_K)}\, p(\eta_1, \eta_2, \ldots, \eta_K). \qquad (22) $$

The Kronecker $\delta$ in Eq. (22) restricts the sum to all LIR of the output $\sigma$. Equation (22) is valid as long as the pattern
load of the MLN does not exceed its saturation threshold $\alpha_c$.



III. SPECIFIC EXAMPLES WITH K = 3 HIDDEN UNITS 

In this section we apply the general formalism developed above to the analysis of simple versions of three popular 
examples of MLN, namely, the PARITY, COMMITTEE and AND machines, each with K = 3 hidden units. We start 
with the RS results. 



A. Replica symmetry 



In COMMITTEE and PARITY machines there is for every LIR of output $\sigma = +1$ an IR with all signs reversed that
realizes output $\sigma = -1$. Therefore $p(h) = p(h|+1) = p(h|-1)$ and the final average over $\sigma$ in Eq. (20) is trivial.
Analyzing Eqs. (13) and (14) in the limit $q \to 1$ one realizes that they depend on both the signs and the values of all
integration variables $y_k$. Expression (13) as well as Eq. (14) are either equal to one or exponentially small in some
or all integration variables. Their quotient, which appears in Eq. (12), can hence become one, zero, or singular with
respect to $y_1$. Whenever it is one, the integral in Eq. (12) gives rise to $\delta(h + y_1)$ for $q \to 1$. Whenever the quotient is
singular, a contribution $\delta(h)$ results.

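Keeping track of the different contributions arising in this way we find for the K = 3 COMMITTEE machine

$$ p(h) = \theta(-h)\, \frac{e^{-h^2/2}}{\sqrt{2\pi}}\, H^2(h) + \frac{5}{24}\, \delta_+(h) + \theta(h)\, \frac{e^{-h^2/2}}{\sqrt{2\pi}} \qquad (23) $$

and for the PARITY machine

$$ p(h) = \frac{1}{2}\, \frac{e^{-h^2/2}}{\sqrt{2\pi}} + \frac{e^{-h^2/2}}{\sqrt{2\pi}}\; 4 \int_0^{|h|} Dt\, H(t) + \frac{1}{12}\, \delta_-(h) + \frac{1}{12}\, \delta_+(h). \qquad (24) $$

Note that $p(h)$ for the PARITY machine is an even function due to the additional symmetry of the Boolean function
$F$ for this case.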
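
As a consistency check on the RS expressions for the COMMITTEE and PARITY machines, the following Python sketch integrates the continuous parts numerically and adds the weights of the singular parts quoted in the text (5/24 for COMMITTEE, 1/12 + 1/12 for PARITY). It assumes that the continuous part is the Gaussian times $H^2(h)$ for $h < 0$ and the plain Gaussian for $h > 0$ (COMMITTEE), and the Gaussian times $1 - 2H^2(|h|)$ (PARITY), the latter using the identity $4\int_0^{x} Dt\, H(t) = 1/2 - 2H^2(x)$. The helper names and the simple Simpson rule are ours.

```python
from math import erfc, exp, pi, sqrt


def H(x):
    """Gaussian tail H(x) = int_x^inf Dt."""
    return 0.5 * erfc(x / sqrt(2.0))


def gauss(h):
    """Standard normal density."""
    return exp(-0.5 * h * h) / sqrt(2.0 * pi)


def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3.0


CUTOFF = 10.0  # the Gaussian weight beyond |h| = 10 is negligible

# COMMITTEE: negative fields survive with probability H(h)^2, positive ones always.
committee_negative = simpson(lambda h: gauss(h) * H(h) ** 2, -CUTOFF, 0.0)
committee_positive = simpson(gauss, 0.0, CUTOFF)
committee_total = committee_negative + 5.0 / 24.0 + committee_positive

# PARITY: even in h, so integrate h > 0 and double; add both delta weights.
parity_half = simpson(lambda h: gauss(h) * (1.0 - 2.0 * H(h) ** 2), 0.0, CUTOFF)
parity_total = 2.0 * parity_half + 1.0 / 12.0 + 1.0 / 12.0

if __name__ == "__main__":
    print(committee_negative, committee_total, parity_total)
```

Under these assumptions both distributions integrate to one, and the weight of unmodified negative fields in the COMMITTEE machine comes out as 7/24, which together with the 5/24 shifted to $+0$ accounts for the prior probability 1/2 of a negative field.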



In the AND machine the output $\sigma = +1$ can be realized by one LIR only, whereas the output $\sigma = -1$ results from
all the remaining $2^K - 1$ IR. Hence $p(h|+1)$ and $p(h|-1)$ differ significantly. In fact, we find for the K = 3 AND
machine

$$ p(h|+1) = \theta(h)\, \frac{e^{-h^2/2}}{\sqrt{2\pi}} + \frac{1}{2}\, \delta_+(h), \qquad (25) $$

$$ p(h|-1) = \theta(-h)\, \frac{e^{-h^2/2}}{\sqrt{2\pi}} + \frac{1}{24}\, \delta_-(h) + \theta(h)\, \frac{e^{-h^2/2}}{\sqrt{2\pi}} \left( 1 - H^2(h) \right), \qquad (26) $$

and $p(h) = [p(h|+1) + p(h|-1)]/2$. Note that we have introduced two different singular contributions $\delta_-(h)$ and
$\delta_+(h)$ in Eqs. (23), (24) and Eqs. (25), (26). The reason for this is that the weight of $\delta_+(h)$ adds to the probability of
positive local fields whereas the weight of $\delta_-(h)$ adds to that of negative local fields. This distinction will be important
later when calculating the correlation coefficients from $p(h)$ (cf. Eq. (21)). The results (23), (24), and (25), (26) are
shown as the dashed lines in Figs. 1-3, respectively.

These RS results are in fact very intuitive and can even be understood quantitatively by assuming that the outcome
of a Gardner calculation corresponds to the result of a learning process in which the initially wrong IR are eliminated
with least adjustment [?]. Due to the permutation symmetry between the hidden units we may consider only the
local field $h_1$ of the first unit of the hidden layer. Before learning, the couplings $\mathbf{J}_k$ are uncorrelated with the patterns
and the local field $h_1$ is consequently Gaussian distributed with zero mean and unit variance.

Now consider, e.g., the PARITY machine. Due to the symmetries discussed above it is sufficient to analyze the case
$\sigma = +1$ and $h_1 > 0$. If $h_2$ and $h_3$ are equal in sign, which occurs with probability 1/2, there is no need to modify the
couplings at all. This gives rise to the first term in Eq. (24), which is just the original Gaussian and describes the
chance that a randomly found IR with $h_1 > 0$ is legal. If $h_2$ and $h_3$ differ in sign the IR is illegal and the couplings
$\mathbf{J}_k$ have to be modified until one of the hidden units changes sign. In an optimal learning scenario the local field
with the smallest magnitude would be selected and the corresponding coupling vector would be modified such that
the field just barely changes sign. Hence $h_1$ still remains unmodified if either $h_2$ or $h_3$ is smaller in absolute value,
which gives rise to the second term in Eq. (24). Finally, if $h_1$ itself is selected for the sign change, which happens
with probability 1/6 for symmetry reasons, it will after learning be either slightly smaller or slightly larger than zero,
which is the origin of the last two terms in Eq. (24).

With similar reasoning it is possible to rederive the RS result for the COMMITTEE machine. Again it is sufficient
to consider the case $\sigma = +1$. If $h_1 > 0$ initially it will not be modified, which gives rise to the last term in Eq. (23).
If, on the other hand, $h_1 < 0$ prior to learning, it will not be modified only if both $h_2$ and $h_3$ are either positive from
the start or easier to make positive than $h_1$. Hence a negative $h_1$ survives the learning process if the other two fields
are both larger. This is described by the first term in Eq. (23). Finally, with probability 5/24 we find that $h_1 < 0$
and either $h_2$ or $h_3$ is even smaller than $h_1$ and therefore harder to correct. In this case the learning would shift $h_1$ to
positive values as described by the second term in Eq. (23). The resulting distribution of local fields will hence have
a dip for negative values of small absolute value, clearly visible in Fig. 1.

The case of the AND machine is the simplest. The output $\sigma = +1$ requires all local fields to be positive. Hence
positive fields are not modified and negative ones are shifted to $+0$, resulting immediately in Eq. (25) which is, of course,
identical to the result for the single-layer perceptron [15,16]. In the case of a negative output $\sigma = -1$ only the IR
(+, +, +) is illegal and must be eliminated, which is again done by changing the sign of the smallest field. This gives
rise to Eq. (26).
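
The least-adjustment picture described above can be simulated directly. The following Python sketch draws three independent Gaussian fields, finds the legal internal representation that requires the smallest total field change, and records how often the first field is pushed to $+0$ or $-0$. Measuring the adjustment by the sum of $|h_k|$ over the flipped units is our assumption (the text only says that the fields of smallest magnitude are selected); all function names are ours.

```python
import random
from itertools import product
from math import prod

MACHINES = {
    "PARITY": lambda taus: prod(taus),
    "COMMITTEE": lambda taus: 1 if sum(taus) > 0 else -1,
    "AND": lambda taus: 1 if all(t == 1 for t in taus) else -1,
}


def least_adjustment_ir(F, sigma, h):
    """Return (initial signs, closest LIR), closeness = sum of |h_k| over flipped units."""
    signs = tuple(1 if x > 0 else -1 for x in h)
    best, best_cost = None, float("inf")
    for taus in product((1, -1), repeat=len(h)):
        if F(taus) != sigma:
            continue  # not a legal internal representation
        cost = sum(abs(x) for x, s, t in zip(h, signs, taus) if s != t)
        if cost < best_cost:
            best, best_cost = taus, cost
    return signs, best


def first_unit_shift_probabilities(name, sigma, samples=40_000, seed=1):
    """Estimate the weights of delta_+(h) and delta_-(h) for the first hidden unit."""
    rng = random.Random(seed)
    F = MACHINES[name]
    to_plus = to_minus = 0
    for _ in range(samples):
        h = [rng.gauss(0.0, 1.0) for _ in range(3)]
        before, after = least_adjustment_ir(F, sigma, h)
        if before[0] != after[0]:
            if after[0] == 1:
                to_plus += 1
            else:
                to_minus += 1
    return to_plus / samples, to_minus / samples


if __name__ == "__main__":
    for name, sigma in (("PARITY", 1), ("COMMITTEE", 1), ("AND", 1), ("AND", -1)):
        print(name, sigma, first_unit_shift_probabilities(name, sigma))
```

Within statistical error the estimates should reproduce the weights of the singular parts quoted in the text: 1/12 each for the PARITY machine, 5/24 for the COMMITTEE machine, and 1/2 and 1/24 for the two outputs of the AND machine.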

It is finally interesting to compare the distribution of local fields found above with that for a single perceptron above
saturation [17,11]. The individual perceptrons in a MLN certainly operate above their storage limit even when the
storage capacity of the MLN is not yet reached. The most remarkable feature of the distribution of local fields for a
perceptron above saturation that minimizes the number of misclassified inputs is a gap separating positive from negative
values. Being intimately related to the failure of any finite level of RSB for this problem, this gap is believed to exist
even in the solution with continuous RSB [?]. On the other hand, none of the distributions for MLN showed a gap.

As should be clear from the above qualitative discussion the reason for this is quite simple. The single perceptron 
above saturation has to reject some inputs as not correctly classifiable. In order to keep the number of these errors 
smallest it chooses those with negative fields of large absolute value. Inputs with initially only slightly negative local 
fields will be learned whereby their local fields shift to values just above zero. In this way the gap occurs. In MLN, 
on the other hand, there is no reason to shift all negative local fields of small absolute value because the correct 
output may be realized by the other hidden units. Therefore one will not find an interval of h values for which p(h) 
is strictly zero. On the other hand, the tendency that predominantly fields of small absolute value will be modified 
in the learning process is clearly shown by the dips of the distribution functions around $h = 0$ (cf. Figs. 1-3).



B. Replica symmetry breaking 

Let us now discuss how the above results are modified by RSB. The analytical and subsequent numerical analysis
of Eqs. (16)-(19) for the K = 3 machines under consideration needs some care in order not to miss the various singular
contributions. We first have to determine the values of the order parameters at the saddle point using Eq. (19). In
the saturation limit $q_1 \to 1$, $(\Phi_{\mathrm{LIR}}(\sigma))^m$ is dominated by one specific LIR, which is selected among all other LIR by
the signs and absolute values of the compound variables $v_k = y_k \sqrt{q_0} + z_k \sqrt{q_1 - q_0}$. $(\Phi_{\mathrm{LIR}}(\sigma))^m$ either tends to 1 or
becomes exponentially small in one or more compound variables $v_k$. Transforming the integration from $z_k$ space to
$v_k$ space allows us to reduce the $K$-fold $z$ integral to a one-dimensional integral. This is performed numerically by
Romberg integration, whereas the outer $y_k$ integrals are done using Gauss-Legendre quadrature [18].
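
The numerical routines themselves are not reproduced in the text. As a reminder of how Romberg integration works, the following Python sketch implements the generic textbook scheme (trapezoidal rule with repeated interval halving plus Richardson extrapolation) and applies it to a test integral of the type occurring here, $\int_{-\infty}^{0} Dh\, H^2(h) = 7/24$, whose value follows from the antiderivative $-H^3/3$. It is not the authors' code; the function names, the cutoff at $h = -8$ and the number of levels are our choices.

```python
from math import erfc, exp, pi, sqrt


def H(x):
    """Gaussian tail H(x) = int_x^inf Dt."""
    return 0.5 * erfc(x / sqrt(2.0))


def romberg(f, a, b, levels=12):
    """Romberg integration of f over [a, b] using `levels` rows of the tableau."""
    table = [[0.0] * levels for _ in range(levels)]
    step = b - a
    table[0][0] = 0.5 * step * (f(a) + f(b))
    for i in range(1, levels):
        step *= 0.5
        # Trapezoidal refinement: only the new midpoints need to be evaluated.
        midpoints = sum(f(a + (2 * k - 1) * step) for k in range(1, 2 ** (i - 1) + 1))
        table[i][0] = 0.5 * table[i - 1][0] + step * midpoints
        # Richardson extrapolation removes successive even powers of the step size.
        for j in range(1, i + 1):
            table[i][j] = table[i][j - 1] + (table[i][j - 1] - table[i - 1][j - 1]) / (4 ** j - 1)
    return table[levels - 1][levels - 1]


def integrand(h):
    """Gaussian density times H(h)^2."""
    return exp(-0.5 * h * h) / sqrt(2.0 * pi) * H(h) ** 2


if __name__ == "__main__":
    value = romberg(integrand, -8.0, 0.0)
    print(value, 7.0 / 24.0)
```

For smooth integrands such as this one the extrapolated value converges rapidly; near the singular contributions discussed above, the integration range would have to be split by hand.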

The saddle point equation (19) is solved with a standard minimization routine (Powell's method in two dimensions
[18]). The values we get for the order parameters and for the storage capacity are consistent with those obtained
earlier. For the K = 3 PARITY machine we find $q_0 = 0$, $w \simeq 67.2$, and $\alpha_c^{\mathrm{RSB}} \simeq 5$ in agreement with [?]. In the case
of the K = 3 COMMITTEE machine we get $q_0 \simeq 0.64$, $w \simeq 21.2$, and $\alpha_c^{\mathrm{RSB}} \simeq 3.14$, a result somewhat larger than
reported previously [?]. Finally, the K = 3 AND machine does not show RSB at all and we find accordingly $q_0 \to 1$,
$w \to \infty$ together with $\alpha_c^{\mathrm{AND}} = 1.31$.

In a second step, we use these values of the order parameters $w$, $q_0$ to calculate the respective distribution of local
fields (16). The distribution functions $p(h)$ obtained in this way are included as full lines in Figs. 1 and 2. Table I
quantifies the main changes. The main modification of the distribution functions of local fields that occurs in one-step
RSB is a redistribution of probability from the $\delta$ peaks at $h = \pm 0$ to the continuous part of the distribution around
zero, resulting in a reduction of the weight of the singular parts by roughly 50%. This gives rise to a less pronounced
dip of the distribution functions around $h = 0$ and is qualitatively similar to the RSB modifications for a single
perceptron above saturation [?]. From the results for the PARITY machine it is conceivable that the central peak
may be reduced further if higher orders of RSB are included and that it might eventually disappear completely in
the full Parisi solution using continuous RSB. For all machines the probability of fields with large absolute values is
hardly affected by RSB.

For the AND machine we did not find RSB at all. The numerical solution of the saddle point equations only gave
the solution $q_0 \to 1$, $w \to \infty$, for which the one-step RSB ansatz coincides with RS. We therefore suspect that
replica symmetry is correct for the AND machine. This is also in accordance with the rule of thumb that RSB is
necessary if the solution space is disconnected. In the AND machine the output $\sigma = +1$ can be realized only by
one LIR, which clearly corresponds to a connected (even convex) solution space. The output $\sigma = -1$ is realized by
all remaining IR, which, as the complement of the previous solution space, must be connected too.

Finally, we have to clarify how much the modifications found for the distributions of local fields change the
probabilities of the internal representations and the correlation coefficients $c_n$, which depend only on the signs of
the local fields.

This question is, in fact, nontrivial only in the case of the COMMITTEE machine. For the AND machine no RSB 
occurs at all and for the PARITY machine the correlation coefficients are completely determined by the symmetry of 
the Boolean function F between hidden units and output. 

For the COMMITTEE machine we find that the probability of the LIR (+, +, +) is shifted from its RS value 0.1250
to 0.1417, an increase of roughly 13%, whereas the probability of each of the three remaining LIR (consisting of two
pluses and one minus each) is reduced by 1.9% from 0.2917 to 0.2861. Qualitatively this means that more inputs
are stored with the LIR (+, +, +) than the fraction 1/8 that had this LIR already by chance before learning. The
learning process hence does not shift illegal IR just up to the decision boundary of the Boolean function $F$; in some
cases the correlations between inputs $\boldsymbol{\xi}$ and couplings $\mathbf{J}$ neglected in RS even allow the safer LIR (+, +, +).

Using Eq. (22) we can now also calculate the correlation coefficients and find that $c_1$ increases by 2.7% from its
RS value 5/12, $c_2$ decreases in absolute value by 13.3% from its RS value -1/6 and $c_3$ decreases in absolute value by
4.5% from its RS value -3/4. This confirms the prediction of [?] that, although crucial for the storage capacity, RSB
has only a minor influence on the correlation coefficients in MLN.
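
The quoted changes of the correlation coefficients follow from the LIR probabilities by simple arithmetic. The following Python sketch evaluates the restricted sum over legal internal representations for the K = 3 COMMITTEE machine with output $\sigma = +1$, using the rounded probabilities given in the text (0.1250 and 0.2917 in RS, 0.1417 and 0.2861 in one-step RSB). The function names are ours, and the small deviations from the quoted percentages are due to the rounding of the input probabilities.

```python
from itertools import product
from math import prod


def committee(taus):
    return 1 if sum(taus) > 0 else -1


def correlation_coefficients(F, sigma, lir_probability, K=3):
    """c_n = sum over LIR of sigma * eta_1 ... eta_n * p(eta), for n = 1..K."""
    coefficients = []
    for n in range(1, K + 1):
        c_n = 0.0
        for eta in product((1, -1), repeat=K):
            if F(eta) == sigma:  # Kronecker delta: legal representations only
                c_n += sigma * prod(eta[:n]) * lir_probability(eta)
        coefficients.append(c_n)
    return coefficients


def committee_lir_probability(p_all_plus, p_two_plus):
    """Probability of (+,+,+) versus each LIR with two pluses and one minus."""
    return lambda eta: p_all_plus if sum(eta) == 3 else p_two_plus


rs = correlation_coefficients(committee, 1, committee_lir_probability(0.1250, 0.2917))
rsb = correlation_coefficients(committee, 1, committee_lir_probability(0.1417, 0.2861))
relative_change = [(abs(b) - abs(a)) / abs(a) for a, b in zip(rs, rsb)]

if __name__ == "__main__":
    print("RS :", rs)
    print("RSB:", rsb)
    print("relative change of |c_n|:", relative_change)
```

The RS values reproduce 5/12, -1/6 and -3/4, and the relative changes of about +2.7%, -13% and -4.5% agree with the numbers quoted above.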

IV. SUMMARY 

Generalizing the calculation of the distribution function of local fields for the single-layer perceptron, we introduced
a general formalism to determine the joint probability distribution $p(h_1, \ldots, h_K)$ of local fields at the $K$ hidden units
of a two-layer neural network of tree architecture with fixed Boolean function between hidden layer and output, both
in replica symmetry and in one-step replica symmetry breaking. Explicit results were obtained for the PARITY,
COMMITTEE, and AND machines with K = 3 hidden units in the saturation limit $\alpha \to \alpha_c$. Although the individual
perceptrons are by far overloaded, there is no gap in the distribution of local fields as known from a single perceptron
above saturation. There is no RSB for the AND machine, which we attribute to the connected solution space for
this architecture. For the PARITY and COMMITTEE machines we find as a result of RSB a slight redistribution of
probability from the singular parts at $h = \pm 0$ to the continuous part around the origin. The correlation coefficients
$c_n$ characterizing the correlations between the legal internal representations are not modified by RSB for the PARITY
machine since in this case they are already fixed by symmetries. For the COMMITTEE machine the changes of the
correlation coefficients are rather small and the RS results derived in [?] may serve as useful approximations.



APPENDIX: REPLICA CALCULATION 



In this appendix we give some more details on the calculation of the distribution function $p(h)$ of the local fields at
the hidden units following Gardner's approach [15].

Introducing the replica trick $1/Z = \lim_{n \to 0} Z^{n-1}$ into Eq. (4) yields

$$ p(h) = \lim_{n \to 0} \left\langle\!\!\left\langle \int \prod_{k,a} d\mu(\mathbf{J}_k^a) \prod_{k,\nu,a} d\mu(\lambda_k^{\nu,a}) \prod_{\nu,a} \theta\big(\sigma^\nu F(\{\mathrm{sgn}(\lambda_k^{\nu,a})\})\big)\, \delta(h - \lambda_1^{1,1}) \right\rangle\!\!\right\rangle_{\{\xi_k^\nu\},\sigma^\nu} \qquad (A1) $$

with replica index $a = 1, \ldots, n$. In the integration measures (6), (7) we replace the $\delta$ functions by their integral form



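$$ \delta\left( (\mathbf{J}_k^a)^2 - \frac{N}{K} \right) = \int \frac{dE_k^a}{4\pi} \exp\left( -\frac{i E_k^a}{2} \left[ (\mathbf{J}_k^a)^2 - \frac{N}{K} \right] \right), \qquad (A2) $$

$$ \delta\left( \lambda_k^{\nu,a} - \mathbf{J}_k^a \cdot \boldsymbol{\xi}_k^\nu \sqrt{K/N} \right) = \int \frac{dx_k^{\nu,a}}{2\pi} \exp\left( i x_k^{\nu,a} \left[ \lambda_k^{\nu,a} - \mathbf{J}_k^a \cdot \boldsymbol{\xi}_k^\nu \sqrt{K/N} \right] \right). \qquad (A3) $$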



We now perform the average over the Gaussian distributed patterns $\xi_{ki}^\nu$, $i = 1, \ldots, N/K$, and introduce the overlaps
$q_k^{ab} = \mathbf{J}_k^a \cdot \mathbf{J}_k^b/(N/K)$ of different replicas of the same perceptron $\mathbf{J}_k$ as well as its conjugated variable $F_k^{ab}$. From
the assumed permutation symmetry of the Boolean function $F$ with respect to all hidden units we infer $q_k^{ab} = q^{ab}$,
$F_k^{ab} = F^{ab}$, and $E_k^a = E^a$ for all $k = 1, \ldots, K$. This gives rise to the form

$$ p(h) = \lim_{n \to 0} \int \prod_{a} \frac{dE^a}{4\pi} \int \prod_{a<b} \frac{dq^{ab}\, dF^{ab}}{2\pi (K/N)}\, \langle\langle p(h|\sigma) \rangle\rangle_\sigma \exp\left( \frac{N}{2} \mathrm{tr}(AQ) + (\alpha N - 1) \langle\langle \ln G_1(Q|\sigma) \rangle\rangle_\sigma + \frac{N}{K} \ln G_2(A) \right), \qquad (A4) $$



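where $Q$ and $A$ denote the symmetric matrices $Q^{aa} = 1$, $Q^{a \neq b} = q^{ab}$ and $A^{aa} = iE^a$, $A^{a \neq b} = -iF^{ab}$. Moreover,

$$ p(h|\sigma) = \int \prod_{k,a} \frac{d\lambda_k^a\, dx_k^a}{2\pi} \exp\left( i \sum_{k,a} x_k^a \lambda_k^a - \frac{1}{2} \sum_{k,a} (x_k^a)^2 - \sum_{k,a<b} q^{ab} x_k^a x_k^b \right) \prod_{a} \theta\big(\sigma F(\{\mathrm{sgn}(\lambda_k^a)\})\big)\, \delta(h - \lambda_1^1), \qquad (A5) $$

$$ G_1(Q|\sigma) = \int \prod_{k,a} \frac{d\lambda_k^a\, dx_k^a}{2\pi} \exp\left( i \sum_{k,a} x_k^a \lambda_k^a - \frac{1}{2} \sum_{k,a} (x_k^a)^2 - \sum_{k,a<b} q^{ab} x_k^a x_k^b \right) \prod_{a} \theta\big(\sigma F(\{\mathrm{sgn}(\lambda_k^a)\})\big), \qquad (A6) $$

$$ G_2(A) = \int \prod_{k,a} \frac{dJ_k^a}{\sqrt{2\pi e}} \exp\left( -\frac{1}{2} \sum_{k;a,b} A^{ab} J_k^a J_k^b \right). \qquad (A7) $$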



In the limit $N \to \infty$ the integral in Eq. (A4) is dominated by the saddle point values of the order parameters $E^a$,
$F^{ab}$, and $q^{ab}$. Solving the saddle point equations with respect to $E^a$ and $F^{ab}$ yields $A = Q^{-1}$. Hence Eq. (A4) takes
the form

$$ p(h) = \lim_{n \to 0} \int \prod_{a<b} dq^{ab}\, \langle\langle p(h|\sigma) \rangle\rangle_\sigma \exp\left( \frac{N}{2} \ln\det(Q) + (\alpha N - 1) \langle\langle \ln G_1(Q|\sigma) \rangle\rangle_\sigma \right). \qquad (A8) $$

$p(h|\sigma)$ can be calculated by assuming either RS or one-step RSB for the matrix $Q$, resulting in Eqs. (12) and (16),
respectively.

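The remaining saddle point condition for the matrix $q^{ab}$ has in one-step RSB (15) the form

$$ \mathop{\mathrm{extr}}_{q_0, q_1, m} \left[ \frac{q_0}{2[1 - q_1 + m(q_1 - q_0)]} + \frac{1}{2m} \ln\left( 1 + \frac{m(q_1 - q_0)}{1 - q_1} \right) + \frac{1}{2} \ln(1 - q_1) + \frac{\alpha}{m} \left\langle\!\!\left\langle \int \prod_{k} Dy_k \ln \int \prod_{k} Dz_k \left( \Phi_{\mathrm{LIR}}(\sigma) \right)^{m} \right\rangle\!\!\right\rangle_\sigma \right]. \qquad (A9) $$

It determines a set of order parameters $\{q_1, q_0, m\}$ for every pattern load below the storage capacity, $\alpha < \alpha_c$. The
abbreviation $\Phi_{\mathrm{LIR}}(\sigma)$ is defined by Eq. (18). The angular brackets $\langle\langle \cdots \rangle\rangle_\sigma$ indicate the average over the two possible
outputs $\sigma = \pm 1$.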

TABLE I. Saturated K = 3 machines: Integrated features of the probability distribution $p(h)$ of the local field. Corrections
by one-step RSB are given in percent of the respective RS value. Dashes indicate that a respective singular contribution does
not occur (COMMITTEE) or that we found no RSB (AND).

FIG. 1. Distribution of the local field $h$ at the hidden units of a K = 3 COMMITTEE tree in one-step RSB (bold) and RS
(dashed). $\delta_+(h)$ is represented by adding its weight to the continuous part of the curve, whereby for a better presentation the
RS peak was shifted slightly to the right.

FIG. 2. Distribution of the local field $h$ at the hidden units of a K = 3 PARITY tree in one-step RSB (bold) and RS
(dashed). $\delta_-(h)$ and $\delta_+(h)$ are represented by adding their weights to the continuous part of the curve.

FIG. 3. Distribution $p(h)$ of the local field $h$ at the hidden units of a K = 3 AND tree in RS (left). The two panels to the
right display its constituents $p(h|\sigma = +1)$ and $p(h|\sigma = -1)$ according to Eqs. (25) and (26). We found no RSB. $\delta_-(h)$ and
$\delta_+(h)$ are represented by adding their weights to the continuous part of the curve.



